Optimizing Document Indexing and Search Term Weighting Based on Probabilistic Models

Authors

Norbert Fuhr

Chris Buckley

Abstract

We describe the application of probabilistic indexing and retrieval methods to the TREC material. For document indexing, we apply a description-oriented approach which uses relevance feedback information from previous queries run on the same collection. This method is also very flexible with respect to the underlying document representation. In our experiments, we consider single words and phrases and use polynomial functions for mapping the statistical parameters of these terms onto probabilistic indexing weights. Based on these weights, a linear (utility-theoretic) retrieval function is applied when no relevance feedback data is available for the specific query. Otherwise, the retrieval-with-probabilistic-indexing model can be used. The experimental results show excellent performance in both cases, but also indicate possible improvements.
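As a rough illustration of the two steps named in the abstract, the sketch below maps a term's statistical parameters onto an indexing weight with a polynomial function and then scores a document with a linear retrieval function. It is a minimal sketch under stated assumptions, not the authors' implementation: the feature choice, the polynomial structure (constant, linear, and squared terms), the coefficient values, and the names `indexing_weight` and `linear_retrieval_score` are all hypothetical. In the description-oriented approach the coefficients would be estimated from relevance feedback data, a step not shown here.

```python
from typing import Dict, Sequence


def indexing_weight(features: Sequence[float], coeffs: Sequence[float]) -> float:
    """Map term statistics onto an indexing weight with a polynomial function.

    Assumed polynomial structure (hypothetical): [1, x1..xn, x1^2..xn^2].
    The result is clipped to [0, 1] so it can be read as a probability estimate.
    """
    expanded = [1.0] + list(features) + [x * x for x in features]
    if len(expanded) != len(coeffs):
        raise ValueError("expected %d coefficients, got %d" % (len(expanded), len(coeffs)))
    raw = sum(c * v for c, v in zip(coeffs, expanded))
    return min(1.0, max(0.0, raw))


def linear_retrieval_score(query_weights: Dict[str, float],
                           doc_weights: Dict[str, float]) -> float:
    """Linear retrieval function: sum of query term weight times indexing weight."""
    return sum(qw * doc_weights.get(term, 0.0) for term, qw in query_weights.items())


# Hypothetical coefficients; in practice these would be fitted to feedback data.
COEFFS = [0.1, 0.4, 0.5, 0.0, 0.0]

# Hypothetical per-term features, e.g. (normalized term frequency, rarity score).
doc_features = {"retrieval": [0.5, 0.2], "indexing": [0.9, 0.8]}
doc_weights = {t: indexing_weight(f, COEFFS) for t, f in doc_features.items()}

query_weights = {"retrieval": 1.0, "ranking": 2.0}
score = linear_retrieval_score(query_weights, doc_weights)
```

Terms that appear in the query but not in the document contribute nothing to the score, which is why `"ranking"` has no effect in this example.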

Similar resources

Models for retrieval with probabilistic indexing

In this article, three retrieval models for probabilistic indexing are described along with evaluation results for each. First is the binary independence indexing (BII) model, which is a generalized version of the Maron and Kuhns indexing model. In this model, the indexing weight of a descriptor in a document is an estimate of the probability of relevance of this document with respect to queries ...

Probabilistic Learning Approaches for Indexing and Retrieval with the TREC-2 Collection

In this paper, we describe the application of probabilistic models for indexing and retrieval with the TREC-2 collection. This database consists of about a million documents (2 gigabytes of data) and 100 queries (50 routing and 50 ad hoc topics). For document indexing, we use a description-oriented approach which exploits relevance feedback data in order to produce a probabilistic indexing wit...

Okapi Chinese Text Retrieval Experiments at TREC-6

The focus of the Okapi TREC-6 Chinese experiments is on investigating the effectiveness of different automatic indexing methods and phrase weighting for retrieval based on probabilistic models over Chinese text. We compare different probabilistic weighting methods based on a range of word and single character approaches. There are two indexing methods used in our experiments. One indexing method i...

Improved Skips for Faster Postings List Intersection

Information retrieval can be achieved through computerized processes by generating a list of relevant responses to a query. The document processor, matching function, and query analyzer are the main components of an information retrieval system. Document retrieval systems are fundamentally based on Boolean, vector-space, probabilistic, and language models. In this paper, a new methodology for mat...
